On the Joint Decoding of LDPC Codes and 
Finite-State Channels via Linear Programming 

Byung-Hak Kim and Henry D. Pfister 

Department of Electrical and Computer Engineering, Texas A&M University 

Email: {bhkim,hpfister}@tamu.edu






Abstract: In this paper, the linear programming (LP) decoder
for binary linear codes, introduced by Feldman, et al., is
extended to joint-decoding of binary-input finite-state channels.
In particular, we provide a rigorous definition of LP
joint-decoding pseudo-codewords (JD-PCWs) that enables evaluation
of the pairwise error probability between codewords and
JD-PCWs. This leads naturally to a provable upper bound on
decoder failure probability. If the channel is a finite-state
intersymbol interference channel, then the LP joint decoder
also has the maximum-likelihood (ML) certificate property and
all integer valued solutions are codewords. In this case, the
performance loss relative to ML decoding can be explained
completely by fractional valued JD-PCWs.



I. Introduction 



A. Motivation 



Message-passing iterative decoding has been a very pop- 
ular decoding algorithm in research and practice for the past 
fifteen years fTl. In the last five years, linear programming 
(LP) decoding has been a popular topic in coding theory and 
has given new insight into the analysis of iterative decoding 
algorithms and their modes of failure fSl fSl pl . For both 
decoders, fractional vectors, known as pseudo-codewords
(PCWs), play an important role in the performance
characterization of these decoders [3] [5]. This is in contrast to classical
coding theory where the performance of most decoding 
algorithms (e.g., maximum-likelihood (ML) decoding) is 
completely characterized by the set of codewords. 

For channels with memory, such as finite-state channels 
(FSCs), the situation is a bit more complicated. In the past, 
one typically separated channel decoding (i.e., estimating 
the channel inputs from the channel outputs) from error- 
correcting code (ECC) decoding (i.e., estimating the trans- 
mitted codeword from estimates of the channel inputs) fE\. 
The advent of message-passing iterative decoding enabled 
the joint-decoding (JD) of the channel and code by iterating 
between these two decoders jT). 

In this paper, we extend the LP decoder to the JD of binary- 
input FSCs and define LP joint-decoding pseudo-codewords 
(JD-PCWs). This leads naturally to a provable upper bound 
(e.g., a union bound) on the probability of decoder failure as 
a sum over all codewords and JD-PCWs. This extension has 
been considered a challenging open problem in the prior
work mim. The problem is well posed by Feldman in his
PhD thesis [2, Section 9.5, page 146]:

"In practice, channels are generally not memory- 
less due to physical effects in the communication 
channel." ... "Even coming up with a proper linear 
cost function for an LP to use in these channels is 
an interesting question. The notions of pseudocode- 
word and fractional distance would also need to be 
reconsidered for this setting." 
Beyond providing a satisfying answer to the above open
question, our primary motivation is the prediction of the
error rate for joint decoding at high SNR. The idea is to
run a simulation at low SNR and keep track of all observed 
codeword and pseudo-codeword errors. A truncated union 
bound is computed by summing over all observed errors 
and the result is an estimate of the error rate at high SNR. 
Computing this bound is complicated by the fact that the loss 
of channel symmetry implies that the dominant PCWs may 
depend on the transmitted sequence. 

While we were preparing this manuscript, we became
aware of a more general approach by Flanagan [9] [10]. In
fact, our LP formulation was developed independently but is
identical to his "Efficient LP relaxation". Our motivation,
however, is somewhat different. The main goal is to use
the error rate of joint LP decoding as a tool to analyze
joint iterative decoding of FSCs and low-density parity-check
(LDPC) codes. Thus, we give novel prediction results in
Sec. V. We also observe that both formulations provide an
ML edge-path certificate that is not equivalent to an ML
codeword certificate (see Remarks 1 and 2). This property is
not guaranteed by Wadayama's approach based on quadratic
programming [8].

The paper is structured as follows. After briefly reviewing
LP decoding and FSCs in the remainder of Sec. I, we describe
the LP joint decoder in Sec. II and define JD-PCWs in Sec.
III. In Sec. IV, we discuss the decoder performance analysis
via the union bound (and pairwise error probability) over
JD-PCWs and notions of generalized Euclidean distance.
Experimental results are given in Sec. V and conclusions
are given in Sec. VI.

B. Background 

Feldman, et al. introduced LP decoding for binary
linear codes in [3] Q. It is based on solving an LP
relaxation of an integer program which is equivalent to
ML decoding. Later this method was extended to codes
over larger alphabets [11] and to the simplified decoding
of intersymbol interference (ISI) [12]. For long codes, the
performance of LP decoding is slightly inferior to iterative
decoding but, unlike the iterative decoder, the LP decoder
either detects a failure or outputs a codeword which is
guaranteed to be the ML codeword.

Let $\mathcal{C} \subseteq \{0,1\}^n$ be the length-$n$ binary linear code defined
by the parity-check matrix $H$ and $c = (c_1, \ldots, c_n)$ be a
codeword. If $\mathcal{I}$ is the set whose elements are the sets of
indices involved in each parity check, then we have

$$\mathcal{C} = \Bigl\{ c \in \{0,1\}^n \;\Big|\; \sum_{i \in I} c_i \equiv 0 \bmod 2, \ \forall I \in \mathcal{I} \Bigr\}.$$

The codeword polytope is the convex hull of $\mathcal{C}$. This polytope
can be quite complicated to describe though, so instead one
constructs a simpler polytope using local constraints. Each
parity-check $I \in \mathcal{I}$ defines a local constraint that can also be
viewed as a polytope in $[0,1]^n$.

Definition 1: The local codeword polytope LCP($I$) associated
with a parity check is the convex hull of the bit
sequences that satisfy the check. It is given explicitly by

$$\mathrm{LCP}(I) \triangleq \bigcap_{S \subseteq I,\ |S|\ \mathrm{odd}} \Bigl\{ c \in [0,1]^n \;\Big|\; \sum_{i \in S} c_i - \sum_{i \in I \setminus S} c_i \le |S| - 1 \Bigr\}.$$

Definition 2: The relaxed polytope $\mathcal{P}(H)$ is the intersection
of the LCPs over all checks, so

$$\mathcal{P}(H) \triangleq \bigcap_{I \in \mathcal{I}} \mathrm{LCP}(I).$$

Theorem 1 (^): Consider $n$ consecutive uses of a symmetric
channel $\Pr(Y = y \mid C = c)$. If a uniform random codeword
is transmitted and $y = (y_1, \ldots, y_n)$ is received, then
the LP decoder outputs $f = (f_1, \ldots, f_n)$ given by

$$\arg\min_{f \in \mathcal{P}(H)} \sum_{i=1}^{n} f_i \log \left( \frac{\Pr(Y_i = y_i \mid C_i = 0)}{\Pr(Y_i = y_i \mid C_i = 1)} \right),$$

which is the ML solution if $f$ is integral (i.e., $f \in \{0,1\}^n$).

Definition 3: An LP decoding pseudo-codeword (LPD-PCW)
of a code defined by the parity-check matrix $H$ is
any vertex of the relaxed (fundamental) polytope $\mathcal{P}(H)$.
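
Membership in the relaxed polytope can be checked directly from the local constraints. The sketch below assumes the standard odd-subset description of the parity polytope (for every odd-sized subset $S$ of a check $I$, the sum of $c_i$ over $S$ minus the sum over $I \setminus S$ is at most $|S| - 1$). The function names and the 0-based check lists are illustrative choices, not notation from the paper.

```python
from itertools import combinations

def in_lcp(c, check, tol=1e-9):
    """Return True if c satisfies every odd-subset inequality of one parity check.

    `check` lists the 0-based bit indices that take part in the check.
    """
    for size in range(1, len(check) + 1, 2):           # odd subset sizes only
        for subset in combinations(check, size):
            inside = sum(c[i] for i in subset)
            outside = sum(c[i] for i in check if i not in subset)
            if inside - outside > size - 1 + tol:
                return False
    return True

def in_relaxed_polytope(c, checks):
    """Membership in P(H): box constraints plus every local codeword polytope."""
    if any(x < 0 or x > 1 for x in c):
        return False
    return all(in_lcp(c, check) for check in checks)

# Single parity-check code SPC(3,2): one check involving all three bits.
checks = [[0, 1, 2]]
print(in_relaxed_polytope([1, 1, 0], checks))       # a codeword
print(in_relaxed_polytope([0.5, 0.5, 0], checks))   # midpoint of 000 and 110
print(in_relaxed_polytope([1, 0, 0], checks))       # odd weight, violates S = {0}
```

The number of inequalities grows exponentially in the check degree, which is acceptable for low-density checks but is the reason this enumeration is only a sketch.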

Definition 4: A finite-state channel (FSC) defines a probabilistic
mapping from a sequence of inputs to a sequence
of outputs. Each output $Y_i \in \mathcal{Y}$ depends only on the current
input $X_i \in \mathcal{X}$ and channel state $S_i \in \mathcal{S}$ instead of the entire
history of inputs and channel states. Mathematically, we have
$P(y, s' \mid x, s) = \Pr(Y_i = y, S_{i+1} = s' \mid X_i = x, S_i = s)$ for all $i$,
and we use the shorthand $P\bigl(y_1^n, s_2^{n+1} \mid x_1^n, s_1\bigr)$ for

$$\Pr\bigl(Y_1^n = y_1^n, S_2^{n+1} = s_2^{n+1} \mid X_1^n = x_1^n, S_1 = s_1\bigr) = \prod_{i=1}^{n} P(y_i, s_{i+1} \mid x_i, s_i).$$

Definition 5: A finite-state intersymbol interference channel
(FSISIC) is a FSC whose next state is a deterministic
function, $\eta(x, s)$, of the current state $s$ and input $x$.
Mathematically, this implies that

$$\sum_{y \in \mathcal{Y}} P(y, s' \mid x, s) = \begin{cases} 1 & \text{if } \eta(x, s) = s' \\ 0 & \text{otherwise.} \end{cases}$$

Figure 1. State diagrams for the noiseless dicode channel with and without
precoding. The edges are labeled by the input/output pair.

Definition 6: The dicode channel (DIC) is a binary-input
FSISIC with a linear response of $G(z) = 1 - z^{-1}$ and
Gaussian noise. If the input bits are differentially encoded
prior to transmission, then the resulting channel is called the
precoded dicode channel (pDIC). The state diagrams of these
two channels are shown in Fig. 1.
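
The two state diagrams can be written out as a list of trellis edges. This is a minimal sketch that assumes channel inputs take the levels 0 and 1 (so the noiseless outputs are in {-1, 0, 1}); the amplitude convention used in Fig. 1 may differ. The names `Edge`, `dicode_step`, `pdic_step`, and `build_trellis` are hypothetical helpers introduced only for illustration.

```python
from collections import namedtuple

# One trellis edge: time index, initial state, final state, input bit, noiseless output.
Edge = namedtuple("Edge", ["t", "s", "s_next", "x", "a"])

def dicode_step(x, s):
    """Dicode response 1 - z^{-1}: the state stores the previous channel input."""
    return x, x - s              # (next state, noiseless output)

def pdic_step(x, s):
    """Precoded dicode: differential encoding followed by the dicode response."""
    u = x ^ s                    # precoder output; the state stores the previous u
    return u, u - s              # (next state, noiseless output)

def build_trellis(step, n, states=(0, 1)):
    """Return one edge per (time, state, input bit)."""
    edges = []
    for t in range(1, n + 1):
        for s in states:
            for x in (0, 1):
                s_next, a = step(x, s)
                edges.append(Edge(t, s, s_next, x, a))
    return edges

for e in build_trellis(pdic_step, n=1):
    print(e)
```

Under this convention the precoded channel has the convenient property that the magnitude of the noiseless output equals the input bit, while the unprecoded channel's next state is simply the current input.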

II. New Results: LP Joint-Decoding 

Now, we describe the LP joint decoder in terms of the
trellis of the FSC and the checks in the binary linear code. Let
$n$ be the length of the code and $y$ be the received sequence.
The trellis consists of $(n+1)|\mathcal{S}|$ vertices (i.e., one for each
state and time) and a set $\mathcal{E}$ of at most $2n|\mathcal{S}|^2$ edges (not
$2n|\mathcal{S}|$), i.e., one edge for each input-labeled state transition
and time. For each edge $e \in \mathcal{E}$, the functions
$t(e) \to \{1, \ldots, n\}$, $s(e) \to \mathcal{S}$, $s'(e) \to \mathcal{S}$, $x(e) \to \{0,1\}$, and
$a(e) \to \mathcal{A}$ map this edge to its respective time index, initial
state, final state, input bit, and noiseless output symbol.
The LP formulation requires one variable for each edge
$e \in \mathcal{E}$, and we denote that variable by $g(e)$. Likewise, the
LP decoder requires one cost variable for each edge and we
use the branch metric

$$b(e) \triangleq \begin{cases} -\ln P\bigl(y_{t(e)}, s'(e) \mid x(e), s(e)\bigr) & \text{if } t(e) > 1 \\ -\ln \Bigl( P\bigl(y_{t(e)}, s'(e) \mid x(e), s(e)\bigr) P\bigl(s(e)\bigr) \Bigr) & \text{if } t(e) = 1. \end{cases}$$

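Definition 7: The trellis polytope $\mathcal{T}$ enforces the flow conservation
constraints for the channel decoder. The flow constraint
for state $j$ at time $i$ is given by

$$\mathcal{F}_{ij} \triangleq \Bigl\{ g(\cdot) \in [0,1]^{|\mathcal{E}|} \;\Big|\; \sum_{e \in \mathcal{E}:\, t(e)=i,\ s'(e)=j} g(e) = \sum_{e \in \mathcal{E}:\, t(e)=i+1,\ s(e)=j} g(e) \Bigr\}.$$

Using this, the trellis polytope $\mathcal{T}$ is given by

$$\mathcal{T} \triangleq \Bigl\{ g(\cdot) \in \bigcap_{i=1}^{n-1} \bigcap_{j \in \mathcal{S}} \mathcal{F}_{ij} \;\Big|\; \sum_{e \in \mathcal{E}:\, t(e)=p} g(e) = 1 \ \text{for any } p \in \{1, \ldots, n\} \Bigr\}.$$

Theorem 2 ([2]): Finding the ML edge-path through a
weighted trellis is equivalent to solving the minimum-cost
flow LP

$$\arg\min_{g(\cdot) \in \mathcal{T}} \sum_{e \in \mathcal{E}} g(e) b(e)$$

and the optimum $g(\cdot)$ must be integral (i.e., $g(\cdot) \in \{0,1\}^{|\mathcal{E}|}$)
unless there are ties.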
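
Because the optimum of the minimum-cost flow LP is an integral path when there are no ties, the same edge-path can be found by a shortest-path search. The sketch below uses the Viterbi algorithm rather than an LP solver. It assumes the precoded dicode channel with 0/1 input levels, a known initial state of 0, and a squared-error edge cost (the Gaussian branch metric up to scaling and an additive constant). All function names are illustrative.

```python
def pdic_step(x, s):
    """Precoded dicode with 0/1 levels (an assumed convention): (next state, output)."""
    u = x ^ s
    return u, u - s

def ml_edge_path(y, step=pdic_step, states=(0, 1), s_init=0):
    """Viterbi search for the minimum-cost path; the cost of an edge is (y_t - a)^2."""
    inf = float("inf")
    cost = {s: (0.0 if s == s_init else inf) for s in states}
    path = {s: [] for s in states}
    for y_t in y:
        new_cost = {s: inf for s in states}
        new_path = {s: [] for s in states}
        for s in states:
            if cost[s] == inf:
                continue                      # state not reachable yet
            for x in (0, 1):
                s_next, a = step(x, s)
                c = cost[s] + (y_t - a) ** 2
                if c < new_cost[s_next]:      # keep the cheaper survivor
                    new_cost[s_next] = c
                    new_path[s_next] = path[s] + [x]
        cost, path = new_cost, new_path
    best = min(states, key=lambda s: cost[s])
    return path[best], cost[best]

inputs, total_cost = ml_edge_path([0.9, -0.2, -1.1])
print(inputs, total_cost)
```

Note that this search ignores the parity checks, so it corresponds to channel decoding alone; the joint decoder adds the code constraints on top of the same edge variables.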

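Definition 8: Let $Q$ be the projection of $g(\cdot)$ onto the input
vector $f = (f_1, \ldots, f_n) \in [0,1]^n$ where $f = Qg$ and

$$f_i = \sum_{e \in \mathcal{E}:\, t(e)=i,\ x(e)=1} g(e).$$

Definition 9: The trellis-wise relaxed polytope $\mathcal{P}_{\mathcal{T}}(H)$ for
$\mathcal{P}(H)$ is defined by

$$\mathcal{P}_{\mathcal{T}}(H) \triangleq \bigl\{ g(\cdot) \in \mathcal{T} \;\big|\; Qg \in \mathcal{P}(H) \bigr\}.$$

Definition 10: The set of trellis-wise codewords $\mathcal{C}_{\mathcal{T}}$ for $\mathcal{C}$
is defined as

$$\mathcal{C}_{\mathcal{T}} \triangleq \bigl\{ g(\cdot) \in \mathcal{P}_{\mathcal{T}}(H) \;\big|\; g(\cdot) \in \{0,1\}^{|\mathcal{E}|} \bigr\}.$$

Theorem 3: The LP joint decoder computes

$$\arg\min_{g(\cdot) \in \mathcal{P}_{\mathcal{T}}(H)} \sum_{e \in \mathcal{E}} g(e) b(e)$$

and outputs a joint ML edge-path if $g(\cdot)$ is integral.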

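Proof: Let $\mathcal{V}$ be the set of valid input/state sequence
pairs. For a given $y$, the ML edge-path decoder computes

$$\begin{aligned}
&\arg\max_{(x_1^n, s_1^{n+1}) \in \mathcal{V}} P\bigl(y_1^n, s_2^{n+1} \mid x_1^n, s_1\bigr) P(s_1) \\
&\quad = \arg\max_{g(\cdot) \in \mathcal{C}_{\mathcal{T}}} P(s_1) \prod_{e \in \mathcal{E}:\, g(e)=1} P\bigl(y_{t(e)}, s'(e) \mid x(e), s(e)\bigr) \\
&\quad = \arg\min_{g(\cdot) \in \mathcal{C}_{\mathcal{T}}} \sum_{e \in \mathcal{E}:\, g(e)=1} b(e) \\
&\quad = \arg\min_{g(\cdot) \in \mathcal{C}_{\mathcal{T}}} \sum_{e \in \mathcal{E}} g(e) b(e),
\end{aligned}$$

where ties are resolved in a systematic manner and $b(e)$ at
$t(e) = 1$ has an additional initial state term $-\ln P(s(e))$.
By relaxing $\mathcal{C}_{\mathcal{T}}$ into $\mathcal{P}_{\mathcal{T}}(H)$, we obtain the desired result. ■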
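
For very short codes, the minimization over the integral points (the trellis-wise codewords) can be carried out by enumeration, which gives a reference answer against which an integral LP solution can be compared. The sketch below is not the LP decoder itself. It assumes the precoded dicode channel with 0/1 levels, initial state 0, and a squared-error edge cost; `joint_ml_by_enumeration` is a hypothetical helper name.

```python
from itertools import product

def pdic_step(x, s):
    """Precoded dicode with 0/1 levels (an assumed convention): (next state, output)."""
    u = x ^ s
    return u, u - s

def joint_ml_by_enumeration(y, checks, s_init=0):
    """Minimize the path cost over all input words that satisfy every parity check."""
    best, best_cost = None, float("inf")
    for xs in product((0, 1), repeat=len(y)):
        if any(sum(xs[i] for i in chk) % 2 for chk in checks):
            continue                          # not a codeword
        s, cost = s_init, 0.0
        for x, y_t in zip(xs, y):
            s, a = pdic_step(x, s)
            cost += (y_t - a) ** 2
        if cost < best_cost:
            best, best_cost = xs, cost
    return best, best_cost

# SPC(3,2) with one check on all three bits.
codeword, cost = joint_ml_by_enumeration([0.9, -0.2, -0.4], checks=[[0, 1, 2]])
print(codeword, cost)
```

For this received vector the cheapest unconstrained path has odd input weight, so the parity check changes the decision; this is the situation in which decoding the channel and the code jointly matters.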
Corollary 1: For a FSISIC, the LP joint decoder outputs
a joint ML codeword if $g(\cdot)$ is integral.

Proof: The joint ML decoder for codewords computes

$$\begin{aligned}
&\arg\max_{x_1^n \in \mathcal{C}} \sum_{s_2^{n+1}} P\bigl(y_1^n, s_2^{n+1} \mid x_1^n, s_1\bigr) P(s_1) \\
&\quad = \arg\max_{x_1^n \in \mathcal{C}} \sum_{s_2^{n+1}} \prod_{i=1}^{n} P(y_i, s_{i+1} \mid x_i, s_i) P(s_1) \\
&\quad \overset{(a)}{=} \arg\max_{x_1^n \in \mathcal{C}} \prod_{i=1}^{n} P\bigl(y_i, \eta(x_i, s_i) \mid x_i, s_i\bigr) P(s_1) \\
&\quad \overset{(b)}{=} \arg\min_{g(\cdot) \in \mathcal{C}_{\mathcal{T}}} \sum_{e \in \mathcal{E}} g(e) b(e),
\end{aligned}$$

where (a) follows from Defn. 5 and (b) holds because each
input sequence defines a unique edge-path. Therefore, the LP
joint-decoder outputs an ML codeword if $g(\cdot)$ is integral. ■

Remark 1: If the channel is not a FSISIC (e.g., a finite-state
fading channel), the integer valued solutions of the LP
joint-decoder are ML edge-paths and not necessarily ML
codewords. This occurs because the decoder is unable to sum
the probabilities of the multiple edge-paths associated with
the same codeword (e.g., if multiple distinct edge-paths are
associated with the same input labels).

III. Joint-Decoding Pseudo-codewords

Pseudo-codewords have been observed and given names
by a number of authors [1] [13] [14], but the simplest general
definition was provided by Feldman, et al. in the context of
LP decoding of parity-check codes |T|. One nice property of
the LP decoder is that it always returns an integer codeword
or a fractional pseudo-codeword. Vontobel and Koetter have
shown that a very similar set of pseudo-codewords also
affect message-passing decoders, and that they are essentially
fractional codewords that cannot be distinguished from
codewords using only local constraints fSl. We define JD-PCWs
in this section because of their primary importance in the
characterization of code performance at very low error rates.

Definition 11: The output of the LP joint decoder is a
trellis-wise (ML) codeword (TCW) if $g(e) \in \{0,1\}$ for all
$e \in \mathcal{E}$. Otherwise, if $g(e) \in (0,1)$ for some $e \in \mathcal{E}$, then
the solution is called a joint-decoding trellis-wise pseudo-codeword
(JD-TPCW) and the decoder outputs "failure".

Definition 12: Any TCW $g$ can be projected onto a
(symbol-wise) codeword (SCW) $f = Qg$. Likewise, any JD-TPCW
$g$ can be projected onto a joint-decoding symbol-wise
pseudo-codeword (JD-SPCW) $f = Qg$.

Remark 2: For a FSISIC, the LP joint decoder has the ML
certificate property; if the decoder outputs a SCW, then it is
guaranteed to be the ML codeword (see Cor. 1).

Definition 13: Any TCW can be projected onto a symbol-wise
signal-space codeword (SSCW) and any JD-TPCW $g$
can be projected onto a joint-decoding symbol-wise signal-space
pseudo-codeword (JD-SSPCW) $p = (p_1, \ldots, p_n)$ by
averaging the components with

$$p_i = \sum_{e \in \mathcal{E}:\, t(e)=i} g(e) a(e).$$

Example 1: Consider the single parity-check code
SPC(3,2). Over the precoded dicode channel (starting in the zero
state) with AWGN, this code has five joint-decoding pseudo-codewords.
A simulation was performed for joint-decoding
of the SPC(3,2) on the pDIC trellis and the set of JD-TPCWs,
by ordering the trellis edges appropriately, was found to be

{(0 1 0 0; 0 0 .5 .5; 0 .5 .5 0), (.5 .5 0 0; .5 0 0 .5; 0 1 0 0),
(.5 .5 0; .5 .5 0; 1 0),(1 0; .5 .5 0; .5 .5 0), 
(.5.500;.5000;0.5.50)}. 

Using $Q$ to project them into $\mathcal{P}(H)$, we get the corresponding
set of JD-SPCWs

{(1, .5, .5), (.5, .5, 1), (.5, .5, 0), (0, .5, .5), (.5, 0, .5)}.
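
The projection in this example can be reproduced numerically. The sketch below takes the first two JD-TPCWs listed above, checks that they conserve flow on the pDIC trellis, and applies $Q$. The edge order within each time step, (s=0,x=0), (s=0,x=1), (s=1,x=0), (s=1,x=1), is an assumption made here; the paper only says the edges are ordered "appropriately".

```python
# Assumed edge order within each time step: (s=0,x=0), (s=0,x=1), (s=1,x=0), (s=1,x=1).
EDGE_STATE = (0, 0, 1, 1)
EDGE_INPUT = (0, 1, 0, 1)

def next_state(x, s):
    """pDIC state update: the state toggles exactly when the input bit is 1."""
    return x ^ s

def project(g):
    """Q: f_i is the total weight on edges with input label 1 at time i."""
    return [sum(w for w, x in zip(step, EDGE_INPUT) if x == 1) for step in g]

def conserves_flow(g, s_init=0, tol=1e-9):
    """Check unit flow out of the initial state and conservation at every state."""
    inflow = {0: 0.0, 1: 0.0}
    inflow[s_init] = 1.0
    for step in g:
        outflow = {0: 0.0, 1: 0.0}
        arriving = {0: 0.0, 1: 0.0}
        for w, s, x in zip(step, EDGE_STATE, EDGE_INPUT):
            outflow[s] += w
            arriving[next_state(x, s)] += w
        if any(abs(outflow[s] - inflow[s]) > tol for s in (0, 1)):
            return False
        inflow = arriving
    return True

g1 = [(0, 1, 0, 0), (0, 0, .5, .5), (0, .5, .5, 0)]
g2 = [(.5, .5, 0, 0), (.5, 0, 0, .5), (0, 1, 0, 0)]
for g in (g1, g2):
    print(conserves_flow(g), project(g))
```

With this edge order, both vectors are valid flows and their projections agree with the first two JD-SPCWs in the set above.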



IV. Union Bound for LP Joint-Decoding 

Now that we have defined the relevant pseudo-codewords,
we turn our attention to the question of "how bad" a certain
pseudo-codeword is, i.e., we want to quantify pairwise error
probabilities. In fact, we will use the insights gained in the
previous section to obtain a union bound on the decoder word
error probability (as a tight approximation) to analyze the
performance of the proposed LP joint decoder. Toward this
end, we first consider the pairwise error event between a SSCW
$c$ and a JD-SSPCW $p$.

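Theorem 4: A necessary and sufficient condition for the
pairwise decoding error between a SSCW $c$ and a JD-SSPCW
$p$ is

$$\sum_{e \in \mathcal{E}} g(e) b(e) < \sum_{e \in \mathcal{E}} \tilde{g}(e) b(e),$$

where $g(\cdot) \in \mathcal{P}_{\mathcal{T}}(H)$ and $\tilde{g}(\cdot) \in \mathcal{C}_{\mathcal{T}}$ are the LP variables for
$p$ and $c$ respectively.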

For the moment, let $c$ be the SSCW of a FSISIC sent over an AWGN
channel whose output sequence is $y = c + v$, where
$v = (v_1, \ldots, v_n)$ is an i.i.d. Gaussian sequence with mean 0 and
variance $\sigma^2$. We will show that each pairwise probability
has a simple closed-form expression that depends only on a
generalized squared Euclidean distance $d_{gen}^2(c, p)$ and the
noise variance $\sigma^2$. The next few definitions and theorems can
be seen as a generalization of [15] and a special case of the
more general formulation in [10].

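Theorem 5: Let $y$ be the output of a FSISIC with zero-mean
AWGN whose variance is $\sigma^2$ per output. Then, the LP
joint decoder is equivalent to

$$\arg\min_{g(\cdot) \in \mathcal{P}_{\mathcal{T}}(H)} \sum_{e \in \mathcal{E}} g(e) \bigl(y_{t(e)} - a(e)\bigr)^2.$$

Proof: For each edge $e \in \mathcal{E}$, the output $y_{t(e)}$ is
Gaussian with mean $a(e)$ and variance $\sigma^2$, so we have
$P\bigl(y_{t(e)}, s'(e) \mid x(e), s(e)\bigr) \sim \mathcal{N}(a(e), \sigma^2)$. Therefore, the
LP joint-decoder computes

$$\arg\min_{g(\cdot) \in \mathcal{P}_{\mathcal{T}}(H)} \sum_{e \in \mathcal{E}} g(e) b(e) = \arg\min_{g(\cdot) \in \mathcal{P}_{\mathcal{T}}(H)} \sum_{e \in \mathcal{E}} g(e) \bigl(y_{t(e)} - a(e)\bigr)^2.$$

■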
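
The step from the branch metric to the squared distance can be spelled out as follows. This is a sketch that assumes a known initial state (so the initial-state term is a constant) and uses the fact that the edge weights in each trellis section sum to one.

```latex
\begin{align*}
b(e) &= -\ln P\bigl(y_{t(e)}, s'(e) \mid x(e), s(e)\bigr)
      = \frac{\bigl(y_{t(e)} - a(e)\bigr)^2}{2\sigma^2}
        + \tfrac{1}{2}\ln\bigl(2\pi\sigma^2\bigr), \\
\sum_{e \in \mathcal{E}} g(e)\, b(e)
     &= \frac{1}{2\sigma^2} \sum_{e \in \mathcal{E}} g(e)\bigl(y_{t(e)} - a(e)\bigr)^2
        + \frac{n}{2}\ln\bigl(2\pi\sigma^2\bigr)
     \qquad \text{since } \sum_{e \in \mathcal{E}:\, t(e)=i} g(e) = 1 \text{ for each } i.
\end{align*}
```

The additive term does not depend on $g(\cdot)$ and the factor $1/(2\sigma^2)$ is positive, so neither changes the minimizer.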
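Definition 14: Let $c$ be a SSCW and $p$ a JD-SSPCW. Then
the generalized squared Euclidean distance between $c$ and $p$
can be defined in terms of their trellis-wise descriptions by

$$d_{gen}^2(c, p) \triangleq \frac{\bigl(\|d\|^2 + \sigma_p^2\bigr)^2}{\|d\|^2},$$

where

$$\|d\|^2 \triangleq \sum_i (c_i - p_i)^2, \qquad \sigma_p^2 \triangleq \sum_{e \in \mathcal{E}} g(e) a^2(e) - \sum_i p_i^2.$$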

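Theorem 6: The pairwise error probability between a
SSCW $c$ and a JD-SSPCW $p$ is

$$\Pr(c \to p) = Q\left( \frac{d_{gen}(c, p)}{2\sigma} \right).$$

Proof: The pairwise error probability $\Pr(c \to p)$ that
the LP joint-decoder will choose the pseudo-codeword $p$ over
$c$ can be written as

$$\begin{aligned}
\Pr(c \to p) &= \Pr\Bigl\{ \sum_{i=1}^{n} \sum_{e \in \mathcal{E}:\, t(e)=i} g(e) \bigl(y_{t(e)} - a(e)\bigr)^2 < \sum_{i=1}^{n} (y_i - c_i)^2 \Bigr\} \\
&= \Pr\Bigl\{ \sum_i y_i (c_i - p_i) < \tfrac{1}{2} \Bigl( \sum_i c_i^2 - \sum_{e \in \mathcal{E}} g(e) a^2(e) \Bigr) \Bigr\} \\
&\overset{(a)}{=} Q\left( \frac{\sum_i c_i (c_i - p_i) - \tfrac{1}{2} \bigl( \sum_i c_i^2 - \sum_{e \in \mathcal{E}} g(e) a^2(e) \bigr)}{\sigma \sqrt{\sum_i (c_i - p_i)^2}} \right) \\
&\overset{(b)}{=} Q\left( \frac{\|d\|^2 + \sigma_p^2}{2\sigma\|d\|} \right) \overset{(c)}{=} Q\left( \frac{d_{gen}(c, p)}{2\sigma} \right),
\end{aligned}$$

where (a) follows from the fact that $\sum_i y_i (c_i - p_i)$ is distributed
$\mathcal{N}\bigl(\sum_i c_i (c_i - p_i),\ \sigma^2 \sum_i (c_i - p_i)^2\bigr)$ and (b)/(c) follow
from Defn. 14. ■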
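
The closed form is easy to evaluate numerically. The sketch below computes $d_{gen} = (\|d\|^2 + \sigma_p^2)/\|d\|$ and the pairwise error probability $Q(d_{gen}/(2\sigma))$ from a trellis-wise description. It assumes the pDIC with 0/1 levels, so that the noiseless outputs of the four edges per time step are (0, 1, 0, -1) in the assumed order (s=0,x=0), (s=0,x=1), (s=1,x=0), (s=1,x=1). The function names and the value of sigma are illustrative.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def generalized_distance(c, g, a):
    """d_gen between the signal-space codeword c and the point with edge weights g.

    g[i][k] is the LP weight and a[i][k] the noiseless output of edge k at time i+1.
    """
    p = [sum(w * out for w, out in zip(gi, ai)) for gi, ai in zip(g, a)]
    d_sq = sum((ci - pi) ** 2 for ci, pi in zip(c, p))
    energy = sum(w * out ** 2 for gi, ai in zip(g, a) for w, out in zip(gi, ai))
    sigma_p_sq = energy - sum(pi ** 2 for pi in p)
    return (d_sq + sigma_p_sq) / math.sqrt(d_sq)

def pairwise_error(c, g, a, sigma):
    """Pr(c -> p) = Q(d_gen / (2 sigma))."""
    return q_function(generalized_distance(c, g, a) / (2 * sigma))

outputs = [(0, 1, 0, -1)] * 3                 # assumed pDIC edge outputs per time step
c_zero = (0, 0, 0)                            # signal-space image of the all-zero codeword
g_frac = [(0, 1, 0, 0), (0, 0, .5, .5), (0, .5, .5, 0)]   # first JD-TPCW of Example 1
print(generalized_distance(c_zero, g_frac, outputs))
print(pairwise_error(c_zero, g_frac, outputs, sigma=0.5))
```

For an integral $g(\cdot)$ the term $\sigma_p^2$ vanishes and $d_{gen}$ reduces to the ordinary Euclidean distance between the two signal-space codewords.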
Wiberg was the first to define a generalization of the Euclidean
distance to explain errors caused by iterative decoding
[Xj and this was extended to non-binary cases [15] where
$\Pr(c \to p)$ looks very similar to Thm. 6. The main difference
is the definition of the trellis-wise approach used for JD-TPCWs.

The performance degradation of LP decoding relative to
ML decoding can be explained by pseudo-codewords, and
their contribution to the error rate depends on $d_{gen}(c, p)$.
Indeed, by defining $K_{d_{gen}}(c)$ as the number of codewords
and JD-PCWs at distance $d_{gen}$ from $c$ and $\mathcal{G}(c)$ as the set
of generalized Euclidean distances, we can write the union
bound on the word error rate (WER) given that $c$ is transmitted, $P_{w|c}$, as

$$P_{w|c} \le \sum_{d_{gen} \in \mathcal{G}(c)} K_{d_{gen}}(c)\, Q\left( \frac{d_{gen}}{2\sigma} \right).$$

Of course, we need the set of JD-TPCWs to compute
$\Pr(c \to p)$ with Thm. 6. There are two complications
with this approach. One is that, as in the original problem [2],
no method is known yet for computing the generalized Euclidean
distance spectrum, apart from going through all error
events explicitly. The other is that, unlike the original problem, the
constraint polytope may not be symmetric under codeword
exchange. Therefore, the decoder performance may not be
symmetric under codeword exchange and may depend on the
transmitted codeword. In
this case, the pseudo-codewords will also depend on the
transmitted sequence.
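
Given a list of generalized distances collected from distinct error events, the truncated union bound is a short sum. The sketch below groups the distances into a spectrum and evaluates the sum of $K_{d_{gen}} Q(d_{gen}/(2\sigma))$ for several noise levels. The distance values are hypothetical placeholders, not results from the paper, and the rounding used to merge equal distances is an implementation choice.

```python
import math
from collections import Counter

def q_function(x):
    """Gaussian tail probability Q(x) = Pr(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def truncated_union_bound(distances, sigma, ndigits=6):
    """Sum K_d * Q(d / (2 sigma)) over the distinct observed generalized distances."""
    spectrum = Counter(round(d, ndigits) for d in distances)   # d -> multiplicity K_d
    return sum(k * q_function(d / (2 * sigma)) for d, k in spectrum.items())

# Hypothetical distances of distinct error events seen in a low-SNR run
# for one transmitted codeword (illustrative values only).
observed = [1.632993, 1.632993, 2.0, 2.309401]
for sigma in (0.5, 0.4, 0.3):
    print(sigma, truncated_union_bound(observed, sigma))
```

Because the dominant error events can depend on the transmitted codeword, such a list would be collected separately for each codeword under study.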

V. Simulation Results and Error Rate Prediction 

In this section, we present simulation results for two LDPC
codes on the precoded dicode channel (pDIC) and use those
results to predict the error rate well beyond the limits of
our simulations. Both codes are (3, 5)-regular binary LDPC
codes; the first has length 155 and the second has length
455. The parity-check matrices were chosen randomly except
that double-edges and four-cycles were avoided. Since the
performance depends on the transmitted codeword, the results
were obtained for 3 randomly chosen codewords of fixed
weight. The weight was chosen to be roughly half the block
length, giving weight 74 in the first case and 226 in the
second case.

Figure 2. Comparison between LP joint decoding and joint iterative message-passing decoding on the precoded dicode channel with
AWGN for random (3,5) regular LDPC codes of length n = 155 (left) and n = 450 (right). The curves shown are the LP-JD WER (solid), LP-JD WER
prediction (dashed) and JIMPD WER (dash-dot). The experiments were repeated for three different non-zero codewords in each case. The dashed curves
are computed using the union bound in Sec. IV based on JD-PCWs observed at 3.4558 dB (left) and 2.6696 dB (right), and the dash-dot curves are obtained
using the state-based joint iterative message-passing decoder (JIMPD) described in [16]. Note that SNR is defined as channel output power divided by $\sigma^2$.

The results are shown in Fig. 2. The solid lines represent
the simulation curves while the dashed lines represent a truncated
union bound. The truncated union bound is obtained
by computing the generalized Euclidean distances associated
with all decoding errors that occurred at some low SNR
points (e.g., WER of roughly 10^^) until we observe
a stationary generalized Euclidean distance spectrum. This
high WER allows the decoder to rapidly discover JD-PCWs.
The dash-dot curves show the state-based joint iterative
message-passing decoder (JIMPD) algorithm described in
[16]. Somewhat surprisingly, we find that LP joint-decoding
outperforms JIMPD by about 0.5 dB at WER of 10'^.

LP decoding is performed in the dual domain because
this is much faster than the primal when using MATLAB.
Because LP decoding is still slow, simulations were
completed up to a WER of roughly lO^'^. It is well-known
that the truncated bound should be relatively tight at high
SNR if all the dominant JD-PCWs have been found.

The final complication that must be discussed is the 
dependence on the transmitted codeword. It is known that 
long LDPC codes with joint iterative decoding experience a 
concentration phenomenon [16] whereby the error probability
associated with transmitting a randomly chosen codeword 
is very close, with high probability, to the average error 
probability over all transmitted codewords. We note that this 
effect starts to appear even at the short block lengths used 
in this example. More research is required to understand this 
effect at moderate block lengths and to verify the same effect 
for LP decoding. 

VI. Conclusions 

In this paper, we present a novel linear-programming (LP)
formulation of joint decoding for LDPC codes on FSCs that
offers decoding performance improvements over joint iterative
decoding. Joint-decoding pseudo-codewords (JD-PCWs) are
also defined and the decoder error rate is upper bounded
by a union bound sum over JD-PCWs. Finally, we propose
a simulation-based semi-analytic method for estimating the
error rate of LDPC codes on FSISIC at high SNR using only
simulations at low SNR.